We discuss the coherence properties of Expected Shortfall (ES) as a financial risk measure.
This statistic arises in a natural way from the estimation of the "average of the 100p%
worst losses" in a sample of returns to a portfolio. Here p is some fixed confidence level. We
also compare several alternative representations of ES which turn out to be more appropriate
for certain purposes.
Risk professionals have been looking for a coherent alternative to Value at Risk (VaR) for
four years. Since the appearance, in 1997, of Thinking Coherently by Artzner et al. [?], followed
by Coherent Measures of Risk [?], it became clear to risk practitioners and researchers that the
gap between market practice and theoretical progress had suddenly widened enormously. These
papers in fact faced, for the first time, the problem of defining in a clear-cut way what properties
a statistic of a portfolio should have in order to be considered a sensible risk measure. The answer
to this question was given through a complete characterization of such properties via an axiomatic
formulation of the concept of coherent risk measure. With this result, risk management suddenly
became a science in its own right, with its own rules correctly defined in a deductive framework.
Surprisingly enough, however, VaR, the risk measure adopted as best practice by essentially all
banks and regulators, failed the entrance exam for this science: VaR is not a coherent risk
measure because it does not fulfill one of the axioms of coherence.
Things are worse than some people care to believe. The fact that for years the class of coherent
measures did not exhibit any known specimen that shared VaR's formidable advantages
(simplicity, wide applicability, universality, ...) led many practitioners to think that coherence
might be some sort of optional property that a risk measure may or may not display. It seemed
that coherent measures belonged to some ideal world that real-world practical risk measures
could only dream of. So little attention was paid to this problem that, to the best of our
knowledge, no risk management textbook ever mentioned the fact that VaR is not coherent.

This attitude means underestimating the impact of the conclusions of [?]. Writing axioms means
crystallizing in a minimal number of precise statements the intrinsic nature of a concept. It is
a necessary step in the process of translating a complex reality into a mathematical
formulation. The axioms of coherence simply embody, in a synthetic and essential way, those
features that single out a risk measure in the class of statistics of a portfolio dynamics, just
as the axiom "it must be higher when air is hotter" identifies a measure of temperature out of
the class of thermodynamical properties of the atmosphere. If you want to use a barometer for
measuring temperature despite the fact that pressure does not satisfy the above axiom, don't
be surprised if you find yourself dressed like an Eskimo on a hot cloudy day or wearing a
swimsuit in icy sunshine.

Broken axioms always lead to paradoxical, wrong results, and VaR is no exception. Once
you know which axiom is violated by VaR, it is child's play to provide examples where the
assessment of risks via VaR is definitely wrong or, in other words, where higher VaR figures
come from less risky portfolios [?].

In this paper and henceforth we are going to take these axioms seriously, as many other groups
of researchers [?, ?, ?, ?], practitioners and regulators [?] have begun to do. To avoid
confusion, if a measure is not coherent we simply choose not to call it a risk measure at all.
In other words, for us, the above-mentioned axioms define the concept of risk itself via the
characterization of the possible operative ways to measure it¹. This might seem a dogmatic
approach, but it is not. We are of course prepared to give up this definition as soon as a
different set of axioms is proposed that is more suitable for a mathematical formulation of the
concept of risk measure. What we are no longer prepared to do, after having learned the lesson
of [?, ?], is discuss risk measures without even defining what "risk measure" means.

We therefore promote the coherence axioms of [?] to key defining properties of any risk measure,
to state clearly that in our opinion speaking of non-coherent measures of risk is as useless and
dangerous as speaking of non-coherent measures of temperature. In our language, the adjective
coherent is simply redundant.

¹ Note that this was indeed the genuine motivation of [?]. Quoting from the introduction: "We provide in
this paper a definition of risk . . . and provide and justify a unified framework for the analysis, construction and
implementation of measures of risk."

Definition 1 (Risk Measure) Consider a set $\mathcal{V}$ of real-valued random variables. A function
$\rho : \mathcal{V} \to \mathbb{R}$ is called a risk measure if it is

(i) monotone: $X \in \mathcal{V},\ X \ge 0 \Rightarrow \rho(X) \le 0$,

(ii) sub-additive: $X, Y, X + Y \in \mathcal{V} \Rightarrow \rho(X + Y) \le \rho(X) + \rho(Y)$,

(iii) positively homogeneous: $X \in \mathcal{V},\ h > 0,\ hX \in \mathcal{V} \Rightarrow \rho(hX) = h\,\rho(X)$, and

(iv) translation invariant: $X \in \mathcal{V},\ a \in \mathbb{R} \Rightarrow \rho(X + a) = \rho(X) - a$.

VaR is not a risk measure because it does not fulfill the axiom of sub-additivity. This property
expresses the fact that a portfolio made of sub-portfolios will risk an amount which is at most the
sum of the separate amounts risked by its sub-portfolios. This is perhaps the most characteristic
feature of a risk measure, something that belongs to everybody's concept of risk. The global
risk of a portfolio will be the sum of the risks of its parts only when the latter
can be triggered by concurrent events, namely if the sources of these risks may conspire to
act together. In all other cases, the global risk of the portfolio will be strictly less than the
sum of its partial risks thanks to risk diversification. This axiom captures the essence of how a
risk measure should behave under the composition/addition of portfolios. It is the key test for
checking whether a measurement of a portfolio's risk is consistent with those of its parts.

For a sub-additive measure, portfolio diversification always leads to risk reduction, while for
measures which violate this axiom, diversification may produce an increase in their value even
when partial risks are triggered by mutually exclusive events [?].

Sub-additivity is necessary for capital adequacy requirements in banking supervision. Think of
a bank made of several branches: if the capital requirement of each branch is dimensioned on
its own risk, the regulator should be confident that the overall bank capital is adequate as
well. This may however not be the case if the adopted measure violates sub-additivity,
since the risk of the whole bank could turn out to be much bigger than the sum of the branches'
risks.

Sub-additivity is an essential property also in portfolio-optimization problems. This property is
in fact related to the convexity² of the risk surface to be minimized in the space of portfolios.
Only if the surfaces are convex will they always be endowed with a unique absolute minimum
and no spurious local minima [?], so that the risk minimization process will always pick up a
unique, well-diversified optimal solution.
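As a quick check of the convexity mentioned above (a standard one-line argument, not spelled out in the text), take $X, Y \in \mathcal{V}$ and $t \in (0,1)$, assuming that all the combinations involved also lie in $\mathcal{V}$. Sub-additivity followed by positive homogeneity gives

```latex
\rho\big(tX + (1-t)Y\big) \le \rho(tX) + \rho\big((1-t)Y\big) = t\,\rho(X) + (1-t)\,\rho(Y)
```

so the risk of a mixed portfolio never exceeds the corresponding mix of the individual risks.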

Therefore, although one can certainly conceive of alternative axiomatic definitions of risk
measure, we strongly believe that no sensible set of axioms could in any case admit sub-additivity
violations.



² In fact, convexity follows from the sub-additivity and positive homogeneity of Definition 1.
2 Constructing a Risk Measure



In what follows we want to show how a coherent alternative to Value at Risk arises as the 
natural answer to simple questions on a specified sample of worst cases of a distribution. We 
will construct this measure in a bottom-up fashion to better appreciate that this construction 
does not leave much freedom and leads in a natural way to essentially one robust solution. 

For the sake of concreteness, X will be the random variable describing the future value of the profit
or loss of a portfolio on some fixed time horizon T from today, and $\alpha = A\% \in (0, 1)$ will be some
percentage which represents a sample of "worst cases" for the portfolio that we want to analyze.
Given this information, the VaR of the portfolio with parameters T and A% is simply given
by the loss associated with the related quantile $x^{(\alpha)}$ of the distribution³:

(1)  $x^{(\alpha)}(X) = \sup\{x \mid P[X \le x] \le \alpha\}$

(2)  $\mathrm{VaR}^{(\alpha)}(X) = -x^{(\alpha)}(X)$

This statistic answers the following question:

(3) What is the minimum loss incurred in the A% worst cases of our portfolio?



Strange as it may sound, this is the most frequently asked question in financial risk management
today. It is precisely because of the "minimum loss" in its definition that VaR is not a sub-additive
measure. Moreover, being simply the threshold of the possible A% losses, VaR is indifferent to how
serious the losses beyond that threshold actually are. Little imagination is needed to invent portfolios
with identical VaR and dramatically different levels of risk in the same A% worst-case sample.
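To make the failure of sub-additivity concrete, here is a minimal Python sketch built on assumptions of our own (they are not in the text): two independent bonds, each losing 100 with probability 4% and otherwise paying a coupon of 2, evaluated at α = 5%. The helpers `var_alpha` and `sum_independent` are hypothetical names; `var_alpha` applies (1) and (2) to a distribution given as a dictionary mapping outcomes to probabilities.

```python
from itertools import product


def var_alpha(dist, alpha):
    """VaR via eqs. (1)-(2) for a discrete P&L distribution {outcome: probability}.

    For a step distribution function, x^(alpha) = sup{x | P[X <= x] <= alpha}
    is the smallest outcome at which the cumulative probability exceeds alpha.
    """
    cum = 0.0
    for x in sorted(dist):
        cum += dist[x]
        if cum > alpha:
            return -x  # eq. (2): VaR = -x^(alpha)
    raise ValueError("total probability must exceed alpha")


def sum_independent(d1, d2):
    """Distribution of X + Y for independent discrete X and Y."""
    out = {}
    for (x, px), (y, py) in product(d1.items(), d2.items()):
        out[x + y] = out.get(x + y, 0.0) + px * py
    return out


# Hypothetical bond: loses 100 with probability 4%, otherwise earns a coupon of 2.
bond = {-100.0: 0.04, 2.0: 0.96}
alpha = 0.05

var_single = var_alpha(bond, alpha)
var_pair = var_alpha(sum_independent(bond, bond), alpha)
print(var_single, var_pair)  # -2.0 98.0, so VaR(X+Y) > VaR(X) + VaR(Y) = -4.0
```

Each bond alone defaults with probability 4%, below α, so its VaR is -2 (a gain); the pair suffers at least one default with probability 1 - 0.96² = 7.84%, above α, so the VaR of the sum jumps to 98.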

At this point, any reader is tempted to replace the above question with the following:

(4) What is the expected loss incurred in the A% worst cases of our portfolio?

We want to show that this is a good idea for at least two different reasons. First, this
question is undoubtedly a more natural one to raise when considering the risks of a specified
sample of worst cases. Second, it naturally leads to the definition of a sub-additive statistic,
as we will see in a few steps.

It is not difficult to understand that if the distribution function of the portfolio is continuous,
then the statistic which answers the above question is simply given by a conditional expected
value below the quantile, or "tail conditional expectation" [?]:

(5)  $\mathrm{TCE}^{(\alpha)}(X) = -E[X \mid X \le x^{(\alpha)}]$

For more general distributions, however, this statistic does not fit question (4), since the event
$\{X \le x^{(\alpha)}\}$ may have a probability larger than A% and is therefore larger than our
set of selected worst cases. Indeed, TCE is a risk measure only when restricted to continuous
distribution functions [?], while it may violate sub-additivity on general distributions [?].

³ We will omit the T dependence where possible.

To understand which statistic is actually hidden in question (4), let us see how we would naturally
answer it given a large number n of realizations $\{X_i\}_{i=1,\dots,n}$ of the random variable X. We
simply have to sort the sample in increasing order and average the first A% of the values. To do this,
define the order statistics $X_{1:n} \le \dots \le X_{n:n}$ as the sorted values of the n-tuple
$(X_1, \dots, X_n)$ and approximate the number of A% elements in the sample by
$w = [n\alpha] = \max\{m \mid m \le n\alpha,\ m \in \mathbb{N}\}$, the integer part of nA%, a choice
that for large n could be replaced by any other integer rounding or truncation close to nα. The set
of A% worst cases is therefore represented by the least w outcomes $\{X_{1:n}, \dots, X_{w:n}\}$.

Postponing the discussion of some subtleties of quantile estimation, we can define the following
natural estimator for the α-quantile $x^{(\alpha)}$:

(6)  $x^{(\alpha)}_n(X) = X_{w:n}$
The natural estimator for the expected loss in the A% worst cases is then simply given by

(7)  $\mathrm{ES}^{(\alpha)}_n(X) = -\frac{\sum_{i=1}^{w} X_{i:n}}{w} = -(\text{Average of least } A\% \text{ outcomes } X_i)$

which we will call the A% Expected Shortfall of the sample. Note that the natural estimator
for TCE

(8)  $\mathrm{TCE}^{(\alpha)}_n(X) = -\frac{\sum_{i=1}^{n} X_i\,1_{\{X_i \le X_{w:n}\}}}{\sum_{i=1}^{n} 1_{\{X_i \le X_{w:n}\}}} = -(\text{Average of all } X_i \le x^{(\alpha)}_n)$

is in general an average of more than A% of the outcomes⁴. This may happen when the probability
of the event $X = x^{(\alpha)}$ is positive (the case of a discrete distribution function), so that
there may be multiple occurrences of the value $X_i = X_{w:n}$.
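A minimal Python sketch of the estimators (6) to (8), assuming the sample is a plain list of floats; `es_sample` and `tce_sample` are hypothetical helper names, and w is taken as the integer part of nα as in the text. The toy sample is our own and is chosen to contain ties at the estimated quantile.

```python
import math


def es_sample(xs, alpha):
    """Sample Expected Shortfall, eq. (7): minus the mean of the w smallest outcomes."""
    w = math.floor(len(xs) * alpha)  # w = [n * alpha], the integer part
    if w < 1:
        raise ValueError("need n * alpha >= 1")
    return -sum(sorted(xs)[:w]) / w  # X_{1:n}, ..., X_{w:n}


def tce_sample(xs, alpha):
    """Sample TCE, eq. (8): minus the mean of ALL outcomes <= X_{w:n}."""
    w = math.floor(len(xs) * alpha)
    if w < 1:
        raise ValueError("need n * alpha >= 1")
    q = sorted(xs)[w - 1]  # quantile estimator X_{w:n} of eq. (6)
    tail = [x for x in xs if x <= q]  # every tie with X_{w:n} is included
    return -sum(tail) / len(tail)


# Hypothetical sample with a repeated value at the estimated quantile.
xs = [-10.0, -1.0, -1.0, -1.0, 0.0, 2.0, 3.0, 5.0]
alpha = 0.25  # n = 8, so w = 2
print(es_sample(xs, alpha), tce_sample(xs, alpha))  # 5.5 3.25
```

Here the estimated quantile $X_{2:8} = -1$ occurs three times, so the TCE estimator averages four of the eight outcomes (50% rather than 25%) and reports a smaller figure than ES.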

It is easy to see that $\mathrm{ES}^{(\alpha)}_n$ is indeed sub-additive for any fixed n. Consider two
variables X and Y and a number n of simultaneous realizations $\{(X_i, Y_i)\}_{i=1,\dots,n}$. We can
prove sub-additivity at a glance:



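(9)  $\mathrm{ES}^{(\alpha)}_n(X + Y) = -\frac{\sum_{i=1}^{w} (X+Y)_{i:n}}{w} \le -\frac{\sum_{i=1}^{w} X_{i:n} + \sum_{i=1}^{w} Y_{i:n}}{w} = \mathrm{ES}^{(\alpha)}_n(X) + \mathrm{ES}^{(\alpha)}_n(Y)$

where the inequality holds because the w smallest values of $X_i + Y_i$ are sums over some set of
w indices, and on any such set the $X_i$ add up to at least $\sum_{i=1}^{w} X_{i:n}$ and the $Y_i$ to
at least $\sum_{i=1}^{w} Y_{i:n}$.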

This result is very encouraging. If we understand which statistic $\mathrm{ES}^{(\alpha)}_n$ estimates
for large n, we are likely to end up with a sub-additive measure. Notice that a proof similar to (9)
would fail for $\mathrm{TCE}^{(\alpha)}_n$.
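The inequality (9) can be checked numerically, and the same check shows why it has no analogue for TCE. The Python sketch below is illustrative only: the joint distribution of (X, Y), the seed and the hand-built four-point counterexample are our own assumptions, and `es_sample`, `tce_sample` and `es_violations` are hypothetical helpers.

```python
import math
import random


def es_sample(xs, alpha):
    """Sample Expected Shortfall, eq. (7)."""
    w = math.floor(len(xs) * alpha)
    return -sum(sorted(xs)[:w]) / w


def tce_sample(xs, alpha):
    """Sample TCE, eq. (8): averages every outcome <= X_{w:n}, ties included."""
    q = sorted(xs)[math.floor(len(xs) * alpha) - 1]
    tail = [x for x in xs if x <= q]
    return -sum(tail) / len(tail)


def es_violations(trials=1000, n=50, alpha=0.1, seed=0):
    """Count random joint samples where ES_n(X+Y) > ES_n(X) + ES_n(Y)."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Y depends on X (an arbitrary illustrative choice), so pairs are joint draws
        ys = [-0.5 * x + rng.expovariate(1.0) - 1.0 for x in xs]
        zs = [x + y for x, y in zip(xs, ys)]
        # small tolerance absorbs floating-point rounding only
        if es_sample(zs, alpha) > es_sample(xs, alpha) + es_sample(ys, alpha) + 1e-12:
            count += 1
    return count


print(es_violations())  # 0, as guaranteed by (9)

# Hand-built counterexample for the sample TCE (n = 4, alpha = 0.5, so w = 2).
x = [-2.0, 0.0, 0.0, 0.0]
y = [0.0, 0.0, 0.0, -2.0]
z = [a + b for a, b in zip(x, y)]
print(tce_sample(z, 0.5), tce_sample(x, 0.5) + tce_sample(y, 0.5))  # 2.0 1.0
```

The random trials never violate (9), whereas ties at $X_{w:n}$ let the sample TCE of $X + Y$ (2.0) exceed the sum of the individual sample TCEs (1.0); the sample ES of the same data gives 2.0 on both sides.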



⁴ We adopt the obvious notation $1_{\{\text{Relation}\}} = 1$ if Relation is true and $1_{\{\text{Relation}\}} = 0$ if Relation is false.
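Now, we can expand the definition of $\mathrm{ES}^{(\alpha)}_n$:

(10)
$$
\begin{aligned}
\mathrm{ES}^{(\alpha)}_n(X) &= -\frac{1}{w}\sum_{i=1}^{w} X_{i:n} \\
&= -\frac{1}{w}\left(\sum_{i=1}^{n} X_i\,1_{\{X_i \le X_{w:n}\}} - \sum_{i=1}^{n} X_{i:n}\left(1_{\{X_{i:n} \le X_{w:n}\}} - 1_{\{i \le w\}}\right)\right) \\
&= -\frac{1}{w}\left(\sum_{i=1}^{n} X_i\,1_{\{X_i \le X_{w:n}\}} - X_{w:n}\sum_{i=1}^{n}\left(1_{\{X_{i:n} \le X_{w:n}\}} - 1_{\{i \le w\}}\right)\right) \\
&= -\frac{n}{w}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\,1_{\{X_i \le X_{w:n}\}} - X_{w:n}\left(\frac{1}{n}\sum_{i=1}^{n} 1_{\{X_i \le X_{w:n}\}} - \frac{w}{n}\right)\right)
\end{aligned}
$$

where the third line uses the fact that the bracket $1_{\{X_{i:n} \le X_{w:n}\}} - 1_{\{i \le w\}}$ is
nonzero only for $i > w$ with $X_{i:n} = X_{w:n}$.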



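If we now had

(11)  $\lim_{n \to \infty} X_{w:n} = x^{(\alpha)}$

with probability 1, it would be easy to conclude that with probability 1 we also have

(12)  $\lim_{n \to \infty} \mathrm{ES}^{(\alpha)}_n(X) = -\frac{1}{\alpha}\left(E[X\,1_{\{X \le x^{(\alpha)}\}}] - x^{(\alpha)}\left(P[X \le x^{(\alpha)}] - \alpha\right)\right)$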



Well, this is the subtlety of quantile estimation we mentioned. Equation (11) does not hold
in general. Nevertheless, it can be shown [?] that eq. (12) is more robust and in fact holds in full
generality. We can then give the following

Definition 2 (Expected Shortfall) Let X be the profit-loss of a portfolio on a specified time
horizon T and let $\alpha = A\% \in (0,1)$ be some specified probability level. The Expected A% Shortfall
of the portfolio is then defined as

(13)  $\mathrm{ES}^{(\alpha)}(X) = -\frac{1}{\alpha}\left(E[X\,1_{\{X \le x^{(\alpha)}\}}] - x^{(\alpha)}\left(P[X \le x^{(\alpha)}] - \alpha\right)\right)$

This definition provides a risk measure satisfying all the axioms of Definition 1. This
explicit formulation was first introduced⁵ in [?], where a general proof of sub-additivity⁶ was also
given that does not rely on the n → ∞ limit of the above proof (9) of sub-additivity of $\mathrm{ES}^{(\alpha)}_n$.
An implicit formulation of ES had already been proved to be coherent in [?], where, however, it
was erroneously identified with TCE.
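A minimal Python sketch of (13) for a discrete distribution, stored as a dictionary mapping outcomes to probabilities. The bond distribution (loss of 100 with probability 4%, coupon of 2 otherwise) and the helper names `upper_quantile`, `expected_shortfall` and `sum_independent` are illustrative assumptions; for the sum of two independent such bonds, $P[X \le x^{(\alpha)}] = 7.84\% > \alpha$, so the correction term in (13) is active.

```python
from itertools import product


def upper_quantile(dist, alpha):
    """x^(alpha) of eq. (1) for a discrete distribution {outcome: probability}."""
    cum = 0.0
    for x in sorted(dist):
        cum += dist[x]
        if cum > alpha:
            return x
    raise ValueError("total probability must exceed alpha")


def expected_shortfall(dist, alpha):
    """ES via eq. (13): -(E[X 1{X <= q}] - q (P[X <= q] - alpha)) / alpha."""
    q = upper_quantile(dist, alpha)
    tail_mean = sum(x * p for x, p in dist.items() if x <= q)  # E[X 1{X <= q}]
    tail_prob = sum(p for x, p in dist.items() if x <= q)      # P[X <= q]
    return -(tail_mean - q * (tail_prob - alpha)) / alpha


def sum_independent(d1, d2):
    """Distribution of X + Y for independent discrete X and Y."""
    out = {}
    for (x, px), (y, py) in product(d1.items(), d2.items()):
        out[x + y] = out.get(x + y, 0.0) + px * py
    return out


bond = {-100.0: 0.04, 2.0: 0.96}  # hypothetical bond: loses 100 w.p. 4%, else earns 2
alpha = 0.05
es_single = expected_shortfall(bond, alpha)
es_pair = expected_shortfall(sum_independent(bond, bond), alpha)
print(round(es_single, 6), round(es_pair, 6))  # 79.6 101.264
```

The ES of the pair (about 101.26) stays below the sum of the individual figures (2 × 79.6 = 159.2), as sub-additivity requires.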

Equation (13) might at first glance look complicated. The concept it expresses is, however,
simple, as it is the literal mathematical translation of our natural question above and the limit
for large n of the straightforward estimator (7). It is easy to realize that TCE, despite its simpler
mathematical formulation (5), is on the contrary related to a much more complicated question
than (4).

⁵ This definition may seem different from [?] because of the use of the upper quantile $x^{(\alpha)}$ instead of the lower quantile $x_{(\alpha)}$.
It can be shown [?] that the two formulations actually coincide.
⁶ It is immediate to verify the other axioms.

To gain better insight into (13), the term $x^{(\alpha)}\left(P[X \le x^{(\alpha)}] - \alpha\right)$ has to be
interpreted as the exceeding part to be subtracted from the expected value $E[X\,1_{\{X \le x^{(\alpha)}\}}]$
when $\{X \le x^{(\alpha)}\}$ has probability larger than α = A%. When, on the contrary,
$P[X \le x^{(\alpha)}] = \alpha$, as is always the case if the probability distribution is continuous, the
term vanishes and it is easy to see that (13) reduces to (5) or, in other words,
$\mathrm{ES}^{(\alpha)} = \mathrm{TCE}^{(\alpha)}$.

The actual simplicity of $\mathrm{ES}^{(\alpha)}$ can be appreciated only by giving up defining it as a
combination of expected values. There exists in fact a representation equivalent to (13) which
reveals in a much more transparent way the direct dependence on the parameter α and on the
distribution function $F(x) = P[X \le x]$. Introducing the so-called generalized inverse function
of F(x),

(14)  $F^{\leftarrow}(p) = \inf\{x \mid F(x) \ge p\}$

one can easily show [?] that $\mathrm{ES}^{(\alpha)}$ can be simply expressed as the negative mean of
$F^{\leftarrow}(p)$ on the confidence level interval $p \in (0, \alpha]$:

(15)  $\mathrm{ES}^{(\alpha)}(X) = -\frac{1}{\alpha}\int_0^{\alpha} F^{\leftarrow}(p)\,dp$

This is the most fundamental formulation of $\mathrm{ES}^{(\alpha)}$. Its mathematical tractability makes it
particularly appropriate for studying the analytical properties of $\mathrm{ES}^{(\alpha)}$. For instance,
continuity in α (a distinguishing property of $\mathrm{ES}^{(\alpha)}$ which $\mathrm{TCE}^{(\alpha)}$ and
$\mathrm{VaR}^{(\alpha)}$ do not share) is manifest in (15), while it is not obvious in (13).
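For a discrete distribution the integral in (15) reduces to a finite sum, because $F^{\leftarrow}$ is piecewise constant. The Python sketch below assumes a distribution stored as a dictionary mapping outcomes to probabilities and a hypothetical bond (loss of 100 with probability 4%, coupon of 2 otherwise); `es_integral` and `var_alpha` are hypothetical helper names. Evaluating both statistics just below and just above α = 4% illustrates the continuity remark.

```python
def es_integral(dist, alpha):
    """ES via eq. (15) for a discrete distribution {outcome: probability}.

    The generalized inverse F^{<-}(p) equals the k-th smallest outcome x_k for
    p in (F_{k-1}, F_k], so the integral over (0, alpha] is a finite sum.
    """
    integral, prev = 0.0, 0.0
    for x in sorted(dist):
        cum = prev + dist[x]
        # x_k times the length of (F_{k-1}, F_k] that falls inside (0, alpha]
        integral += x * (min(cum, alpha) - prev)
        if cum >= alpha:
            break
        prev = cum
    else:
        raise ValueError("total probability must reach alpha")
    return -integral / alpha


def var_alpha(dist, alpha):
    """VaR via eqs. (1)-(2), for comparison."""
    cum = 0.0
    for x in sorted(dist):
        cum += dist[x]
        if cum > alpha:
            return -x
    raise ValueError("total probability must exceed alpha")


bond = {-100.0: 0.04, 2.0: 0.96}  # hypothetical bond
for a in (0.0399, 0.0401):
    print(a, var_alpha(bond, a), round(es_integral(bond, a), 4))
```

Across α = 4%, VaR jumps from 100 to -2, whereas ES moves only from 100 to about 99.75. At α = 5% the function returns 79.6 for this bond, the same value that (13) gives.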



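An alternative useful expression equivalent to (13) has recently been formulated in [?], where,
however, the terminology⁷ "α-Conditional Value at Risk" is adopted for $\mathrm{ES}^{(\alpha)}$:

(16)  $\mathrm{ES}^{(\alpha)} = \mathrm{TCE}^{(\alpha)} + (\lambda - 1)\left(\mathrm{TCE}^{(\alpha)} - \mathrm{VaR}^{(\alpha)}\right)$

with $\lambda = P[X \le x^{(\alpha)}]/\alpha \ge 1$. This relationship, which can be easily derived from (13)
by multiplying and dividing by $P[X \le x^{(\alpha)}]$, makes evident that in general
$\mathrm{ES}^{(\alpha)} \ge \mathrm{TCE}^{(\alpha)}$, since $\lambda \ge 1$ and $\mathrm{TCE}^{(\alpha)} \ge \mathrm{VaR}^{(\alpha)}$.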
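Relation (16) can be checked numerically. The sketch below assumes a hypothetical discrete P&L with an atom at the quantile (so that λ > 1) and a hypothetical helper `decompose` that evaluates (2), (5), (13) and λ directly from their definitions.

```python
def decompose(dist, alpha):
    """Return (ES, TCE, VaR, lambda) for a discrete distribution {outcome: probability}."""
    cum = 0.0
    for x in sorted(dist):
        cum += dist[x]
        if cum > alpha:
            q = x  # x^(alpha) of eq. (1)
            break
    else:
        raise ValueError("total probability must exceed alpha")
    tail_prob = sum(p for x, p in dist.items() if x <= q)      # P[X <= x^(alpha)]
    tail_mean = sum(x * p for x, p in dist.items() if x <= q)  # E[X 1{X <= x^(alpha)}]
    es = -(tail_mean - q * (tail_prob - alpha)) / alpha       # eq. (13)
    tce = -tail_mean / tail_prob                              # eq. (5)
    var = -q                                                  # eq. (2)
    lam = tail_prob / alpha                                   # lambda >= 1
    return es, tce, var, lam


# Hypothetical P&L with an atom at the quantile: P[X <= x^(alpha)] = 0.0784 > alpha.
pair = {-200.0: 0.0016, -98.0: 0.0768, 4.0: 0.9216}
es, tce, var, lam = decompose(pair, 0.05)
rhs = tce + (lam - 1.0) * (tce - var)  # right-hand side of eq. (16)
print(round(es, 6), round(rhs, 6), round(tce, 6))
```

Both sides of (16) come out at about 101.26, above TCE (about 100.08), in line with $\mathrm{ES}^{(\alpha)} \ge \mathrm{TCE}^{(\alpha)}$.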



3 Conclusions 

We started with an impasse coming from the fact that VaR had been clearly shown to be unfit
for describing the risks of a portfolio, and yet no valid practical alternative was available in
the class of eligible measures of risk.

⁷ We prefer the terminology "Expected A% Shortfall" because it better suits the "expected loss in the A%
worst cases", while "α-Conditional Value at Risk" is a name that had been tailored to the conditional expectation
of definition (5).
In this article we have seen that at least one specimen of the class of coherent risk measures
allows us not to give up any of the advantages people got used to after the advent of VaR. ES is 
in fact universal: it can be applied to any instrument and to any underlying source of risk. ES 
is complete: it produces a unique global assessment for portfolios exposed to different sources 
of risk. ES is (even more than VaR) a simple concept since it is the answer to a natural and 
legitimate question on the risks run by a portfolio. Furthermore, any bank that has a VaR-based 
Risk Management system could switch to ES with virtually no additional computational effort. 

Even though a lot of work still has to be done to better investigate the statistical, probabilistic
and computational issues raised by the use of ES, we believe that no serious difficulty will be
encountered in adapting to it all the techniques developed in recent years for efficient calculations
of VaR.